Efficient SQL-Querying Method for Data Mining in Large Data Bases
Abstract
Data mining can be understood as the process of extracting knowledge hidden in very large data sets. Data mining techniques (e.g., discretization or decision trees) are often based on searching for an optimal partition of the data with respect to some optimization criterion. In this paper, we investigate the problem of finding an optimal binary partition of a continuous attribute domain for large data sets stored in relational databases (RDB). The time complexity of algorithms solving this problem is determined by the number of simple SQL queries of the form SELECT COUNT FROM ... WHERE attribute BETWEEN ... (each related to some interval of attribute values) needed to construct such partitions. We assume that the answer time for such queries does not depend on the interval length. With the straightforward approach to optimal partition selection (with respect to a given measure), the number of necessary queries is of order O(N), where N is the number of pre-assumed partitions of the search space. We show some properties of the considered optimization measures that allow the size of the search space to be reduced. Moreover, we prove that, using only O(log N) simple queries, one can construct a partition very close to optimal.
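The straightforward O(N) baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the table `data`, its columns `a` (continuous attribute) and `d` (decision class), the helper names, and the discernibility-style measure (the number of object pairs from different classes separated by the cut) are all assumptions made for the example. The O(log N) construction is not described in the abstract and is not attempted here.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table (hypothetical schema: a REAL, d INTEGER)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (a REAL, d INTEGER)")
    rows = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1),
            (5.0, 1), (6.0, 0), (7.0, 1), (8.0, 1)]
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    return conn


class QueryCounter:
    """Wraps the connection and counts how many COUNT queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def count(self, lo, hi, cls):
        # One "simple query" in the sense of the abstract: an interval count.
        self.queries += 1
        cur = self.conn.execute(
            "SELECT COUNT(*) FROM data WHERE a BETWEEN ? AND ? AND d = ?",
            (lo, hi, cls),
        )
        return cur.fetchone()[0]


def discernibility(left, right):
    """Pairs (x, y) with x left of the cut, y right of it, in different classes."""
    same_class_pairs = sum(l * r for l, r in zip(left, right))
    return sum(left) * sum(right) - same_class_pairs


def best_cut_straightforward(db, cuts, lo, hi, classes):
    """Evaluate every candidate cut: O(N) interval queries per class."""
    totals = [db.count(lo, hi, k) for k in classes]
    best_cut, best_value = None, -1
    for c in cuts:
        left = [db.count(lo, c, k) for k in classes]
        right = [t - l for t, l in zip(totals, left)]
        value = discernibility(left, right)
        if value > best_value:
            best_cut, best_value = c, value
    return best_cut, best_value
```

For example, calling `best_cut_straightforward(QueryCounter(make_demo_db()), [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5], 1.0, 8.0, [0, 1])` evaluates all seven candidate cuts. The query count grows linearly with the number of candidates, which is the cost the paper aims to reduce.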
Similar resources
Querying Hierarchical Data in Very Large Databases
Hierarchical data, such as Partially Ordered Set (POSET) is tremendously used in relational databases, especially in data mining and data warehouse based-applications. Unfortunately, SQL (Structured Query Language) does not effectively support hierarchical data structure to manage this sort of data, for example, in Oracle, a CONNECT BY operator is used to query data organized into trees, howeve...
Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL
A huge amount of time is needed for making the dataset for the data mining analysis because data mining practitioners required to write complex SQL queries and many tables are to be joined to get the aggregated result. The traditional SQL aggregations prepare the data sets in vertical layout that is; they return result on one column per aggregated group. But for the data mining project, the dat...
SQL based frequent pattern mining
Data mining on large relational databases has gained popularity and its significance is well recognized. However, the performance of SQL based data mining is known to fall behind specialized implementation since the prohibitive nature of the cost associated with extracting knowledge, as well as the lack of suitable declarative query language support. Frequent pattern mining is a foundation of s...
Improving Analysis Of Data Mining By Creating Dataset Using Sql Aggregations
In Data mining, an important goal is to generate efficient data. Efficiency and scalability have always been important concerns in the field of data mining. The increased complexity of the task calls for algorithms that are inherently more expensive. To analyze data efficiently, Data mining systems are widely using datasets with columns in horizontal tabular layout. Preparing a data set is mor...
Caching for Multi-dimensional Data Mining Queries
Multi-dimensional data analysis and online analytical processing are standard querying techniques applied on today’s data warehouses. Data mining algorithms, on the other hand, are still mostly run in stand-alone, batch mode on flat files extracted from relational databases. In this paper we propose a general querying model combining the power of relational databases, SQL, multidimensional quer...
Publication date: 1999